FOCUSED INFORMATION CRITERION AND MODEL AVERAGING FOR
GENERALIZED ADDITIVE PARTIAL LINEAR MODELS

By Xinyu Zhang and Hua Liang

Chinese Academy of Sciences and University of Rochester

We study model selection and model averaging in generalized
additive partial linear models (GAPLMs). Polynomial splines are used to
approximate the nonparametric functions. The corresponding estimators
of the linear parameters are shown to be asymptotically normal. We
then develop a focused information criterion (FIC) and a frequentist
model average (FMA) estimator on the basis of the quasi-likelihood
principle and examine theoretical properties of the FIC and FMA.
The major advantages of the proposed procedures over the existing
ones are their computational expediency and theoretical reliability.
Simulation experiments provide evidence of the superiority of
the proposed procedures. The approach is further applied to a
real-world data example.

1. Introduction. Generalized additive models, which generalize
generalized linear models by replacing the sum of linear components with
a sum of one-dimensional nonparametric functions, have been
widely used to explore the complicated relationships between a response to
treatment and predictors of interest [Hastie and Tibshirani (1990)].
Various attempts are still being made to balance the interpretability of
generalized linear models and the flexibility of generalized additive models,
such as generalized additive partial linear models (GAPLMs), in which some of
the additive component functions are linear, while the remaining ones are
modeled nonparametrically [Härdle et al. (2004a, 2004b)]. A special case of a
GAPLM with a single nonparametric component, the generalized partial
linear model (GPLM), has been well studied in the literature; see, for example,
Severini and Staniswalis (1994), Lin and Carroll (2001), Hunsberger (1994), 
Hunsberger et al. (2002) and Liang (2008). The profile quasi-likelihood pro- 
cedure has generally been used, that is, the estimation of GPLM is made 
computationally feasible by the idea that estimates of the parameters can 
be found for a known nonparametric function, and an estimate of the non- 
parametric function can be found for the estimated parameters. Severini 
and Staniswalis (1994) showed that the resulting estimators of the param- 
eter are asymptotically normal and that estimators of the nonparametric 
functions are consistent in supremum norm. The computational algorithm 
involves searching for maxima of global and local likelihoods simultaneously. 
It is worthwhile to point out that studying GPLM is easier than studying 
GAPLMs, partly because there is only one nonparametric term in GPLM. 
Correspondingly, implementation of the estimation for GPLM is simpler 
than for GAPLMs. Nevertheless, the GAPLMs are more flexible and useful 
than GPLM because the former allow several nonparametric terms for some 
covariates and parametric terms for others, and thus it is possible to explore 
more complex relationships between the response variables and covariates. 
For example, Shiboski (1998) used a GAPLM to study AIDS clinical trial
data and Müller and Rönz (2000) used a GAPLM to carry out credit
scoring. However, few theoretical results are available for GAPLMs, due to their
general flexibility. In this article, we shall study estimation of GAPLMs
using polynomial splines, establish asymptotic normality for the estimators of
the linear parameters and develop a focused information criterion (FIC) for
model selection and a frequentist model averaging (FMA) procedure for
constructing confidence intervals for the focus parameters with improved
coverage probability.

We know that traditional model selection methods such as the Akaike 
information criterion [AIC, Akaike (1973)] and the Bayesian information 
criterion [BIC, Schwarz (1978)] aim to select a model with good overall 
properties, but the selected model is not necessarily good for estimating 
a specific parameter under consideration, which may be a function of the 
model parameters; see an inspiring example in Section 4.4 of Claeskens and 
Hjort (2003). Exploring the data set from the Wisconsin epidemiologic study
of diabetic retinopathy, Claeskens, Croux and van Kerckhoven (2006) also
noted that different models are suitable for different patient groups. This
occurrence has been confirmed by Hand and Vinciotti (2003) and Hansen
(2005). Motivated by this concern, Claeskens and Hjort (2003) proposed
a new model selection criterion, FIC, which is an unbiased estimate of the
limiting risk for the limit distribution of an estimator of the focus parameter,
and systematically developed a general asymptotic theory for the proposed
criterion. More recently, FIC has been studied in several models. Hjort and
Claeskens (2006) developed the FIC for the Cox hazard regression model and
applied it to a study of skin cancer; Claeskens, Croux and van Kerckhoven 
(2007) introduced the FIC for autoregressive models and used it to predict 
the net number of new personal life insurance policies for a large insurance 
company. 

Existing model selection methods arrive at a single model that is
thought to capture the main information in the data, and this model is then
treated as if it had been decided in advance of the data analysis. Such an
approach ignores the uncertainty introduced by model selection. Thus, the reported
confidence intervals are too narrow or shift away from the correct location,
and the corresponding coverage probabilities of the resulting confidence
intervals can substantially deviate from the nominal level [Danilov and Magnus
(2004) and Shen, Huang and Ye (2004)]. Model averaging, as an alternative
to model selection, not only provides a kind of insurance against select- 
ing a very poor model, but can also avoid model selection instability [Yang 
(2001) and Leung and Barron (2006)] by weighting/smoothing estimators 
across several models, instead of relying entirely on a single model selected 
by some model selection criterion. As a consequence, analysis of the dis- 
tribution of model averaging estimators can improve coverage probabilities. 
This strategy has been adopted and studied in the literature; see, for example,
Draper (1995), Buckland, Burnham and Augustin (1997), Burnham and
Anderson (2002), Danilov and Magnus (2004) and Leeb and Pötscher (2006).
A seminal work, Hjort and Claeskens (2003), developed asymptotic distri- 
bution theories for estimation and inference after model selection and model 
averaging across parametric models. See Claeskens and Hjort (2008) for a 
comprehensive survey on FIC and model averaging. 

FIC and FMA have been well studied for parametric models. However, 
few efforts have been made to study FIC and FMA for semiparametric mod- 
els. To the best of our knowledge, only Claeskens and Carroll (2007) stud- 
ied FMA in semiparametric partial linear models with a univariate non- 
parametric component. The existing results are hard to extend directly to 
GAPLMs, for the following reasons: (i) there exist nonparametric compo- 
nents in GAPLMs, so the ordinary likelihood method cannot be directly 
used in estimation for GAPLMs; (ii) unlike the semiparametric partial lin- 
ear models in Claeskens and Carroll (2007), GAPLMs allow for multivari- 
ate covariate consideration in nonparametric components and also allow for 
the mean of the response variable to be connected to the covariates by a 
link function, which means that the binary/count response variable can be 
considered in the model. Thus, to develop FIC and FMA procedures for 
GAPLMs and to establish asymptotic properties for these procedures are 
by no means straightforward to achieve. Aiming at these two goals, we first 
need to appropriately estimate the coefficients of the parametric components 
(hereafter, we call these coefficients "linear parameters"). 

There are two commonly used estimation approaches for GAPLMs: the
first is local scoring backfitting, proposed by Buja, Hastie and Tibshirani
(1989); the second is an application of the marginal integration approach on
the nonparametric component [Linton and Nielsen (1995)]. However, the
theoretical properties of the former are not well understood since it is only
defined implicitly as the limit of a complicated iterative algorithm, while
the latter suffers from the curse of dimensionality [Härdle et al. (2004a)],
which may lead to an increase in the computational burden and which also
conflicts with the purpose of using a GAPLM, that is, dimension reduction.
Therefore, in this article, we apply polynomial splines to approximate the
nonparametric functions in GAPLMs. After the spline basis is chosen, the
nonparametric components are replaced by a linear combination of spline basis
functions, and the coefficients can then be estimated by an efficient one-step
maximizing procedure. Since the polynomial-spline-based method solves much
smaller systems of equations than kernel-based methods (whose larger systems
may lead to identifiability problems), our polynomial-spline-based
procedures can substantially reduce the computational burden. See a
similar discussion about this computational issue in Yu, Park and Mammen
(2008), in the generalized additive models context.

The use of polynomial splines in generalized nonparametric models can be
traced back to Stone (1986), where the rates of convergence of the polynomial
spline estimates for the generalized additive model were first obtained. Stone
(1994) and Huang (1998) investigated polynomial spline estimation for
the generalized functional ANOVA model. In a widely discussed paper, Stone
et al. (1997) presented a completely theoretical setting of polynomial spline
approximation, with applications to a wide array of statistical problems, 
ranging from least-squares regression, density and conditional density esti- 
mation, and generalized regression such as logistic and Poisson regression, 
to polychotomous regression and hazard regression. Recently, Xue and Yang 
(2006) studied estimation in the additive coefficient model with continuous 
response using polynomial spline to approximate the coefficient functions. 
Sun, Kopciuk and Lu (2008) used polynomial spline in partially linear single- 
index proportional hazards regression models. Fan, Feng and Song (2009) 
applied polynomial spline to develop nonparametric independence screening 
in sparse ultra-high-dimensional additive models. Few attempts have been 
made to study polynomial spline for GAPLMs, due to the extreme technical 
difficulties involved. 

The remainder of this article is organized as follows. Section 2 sets out 
the model framework and provides the polynomial spline estimation and 
asymptotic normality of estimators. Section 3 introduces the FIC and FMA
procedures and constructs confidence intervals for the focus parameters on
the basis of FMA estimators. A simulation study and real-world data analysis
are presented in Sections 4 and 5, respectively. Regularity conditions and 
technical proofs are presented in the Appendix. 

2. Model framework and estimation. We consider a GAPLM where the
response $Y$ is related to covariates $\mathbf{X} = (X_1, \ldots, X_p)^T \in R^p$ and
$\mathbf{Z} = (Z_1, \ldots, Z_d)^T \in R^d$. Let the unknown mean response be
$u(\mathbf{x}, \mathbf{z}) = E(Y \mid \mathbf{X} = \mathbf{x}, \mathbf{Z} = \mathbf{z})$
and the conditional variance function be defined by a known positive function
$V$, $\operatorname{var}(Y \mid \mathbf{X} = \mathbf{x}, \mathbf{Z} = \mathbf{z}) = V\{u(\mathbf{x}, \mathbf{z})\}$. In this article, the mean function
$u$ is defined via a known link function $g$ by an additive linear function

(2.1)  $g\{u(\mathbf{x}, \mathbf{z})\} = \sum_{\alpha=1}^{p} \eta_\alpha(x_\alpha) + \mathbf{z}^T \beta$,

where $x_\alpha$ is the $\alpha$th element of $\mathbf{x}$, $\beta$ is a $d$-dimensional regression parameter
and the $\eta_\alpha$'s are unknown smooth functions. To ensure identifiability, we
assume that $E\{\eta_\alpha(X_\alpha)\} = 0$ for $1 \le \alpha \le p$.

Let $\beta = (\beta_c^T, \beta_u^T)^T$ be a vector with $d = d_c + d_u$ components, where $\beta_c$
consists of the first $d_c$ parameters of $\beta$ (which we certainly wish to be in
the selected model) and $\beta_u$ consists of the remaining $d_u$ parameters (for
which we are unsure whether or not they should be included in the selected
model). In what follows, we call the elements of $\mathbf{z}$ corresponding to $\beta_c$ and
$\beta_u$ the certain and exploratory variables, respectively. As in the literature
on FIC, we consider a local misspecification framework where the true value
of the parameter vector $\beta$ is $\beta_0 = (\beta_{c,0}^T, \delta^T/\sqrt{n})^T$, with $\delta$ being a $d_u \times 1$
vector; that is, the true model is away from the reduced model (the one with
$\beta_u = 0$) at a distance $O(1/\sqrt{n})$. This framework indicates that squared model biases and
estimator variances are both of size $O(1/n)$, so that neither dominates the other in
large-sample approximations. Some arguments related to this framework appear in Hjort
and Claeskens (2003, 2006).

Denote by $\beta_S = (\beta_c^T, \beta_{u,S}^T)^T$ the parameter vector in the $S$th submodel, in
the same sense as $\beta$, with $\beta_{u,S}$ being a $d_{u,S}$-subvector of $\beta_u$. Let $\pi_S$ be the
projection matrix of size $d_{u,S} \times d_u$ mapping $\beta_u$ to $\beta_{u,S}$. With $d_u$ exploratory
covariates, our setup allows $2^{d_u}$ extended models to choose among. However,
it is not necessary to deal with all $2^{d_u}$ possible models and one is free to
consider only a few relevant submodels (not necessarily nested or ordered)
to be used in the model selection or averaging. A special example is the
James-Stein-type estimator studied by Kim and White (2001), which is a
weighted sum of the estimators based on the reduced model ($d_{u,S} = 0$)
and the full model ($d_{u,S} = d_u$). So, the covariates in the $S$th submodel are $\mathbf{X}$
and $\Pi_S \mathbf{Z}$, where $\Pi_S = \operatorname{diag}(I_{d_c}, \pi_S)$. To save space, we generally ignore the
dimensions of zero vectors/matrices and identity matrices, simply denoting
them by $0$ and $I$, respectively. If necessary, we will write their dimensions
explicitly. In the remainder of this section, we shall investigate polynomial
spline estimation for $(\beta_{c,0}^T, 0)$ based on the $S$th submodel and establish a
theoretical property for the resulting estimators.
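
The submodel bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `submodel_projections` and `apply_matrix`, and the list-of-lists matrix representation, are hypothetical choices made for this example and do not come from the paper.

```python
from itertools import combinations

def submodel_projections(d_c, d_u):
    """Yield (S, pi_S, Pi_S) for every subset S of the d_u exploratory covariates."""
    for size in range(d_u + 1):
        for S in combinations(range(d_u), size):
            # pi_S: (|S| x d_u) 0/1 matrix picking the entries of beta_u indexed by S
            pi_S = [[1 if col == j else 0 for col in range(d_u)] for j in S]
            # Pi_S = diag(I_{d_c}, pi_S): keep every certain covariate, then the selected ones
            Pi_S = [[1 if col == r else 0 for col in range(d_c + d_u)] for r in range(d_c)]
            Pi_S += [[0] * d_c + row for row in pi_S]
            yield S, pi_S, Pi_S

def apply_matrix(matrix, vector):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

# Two certain and three exploratory covariates give 2**3 = 8 candidate submodels.
models = list(submodel_projections(d_c=2, d_u=3))
```

Restricting attention to a few relevant submodels, as the text allows, amounts to filtering this list before model selection or averaging.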

Let $\eta_0 = \sum_{\alpha=1}^{p} \eta_{0,\alpha}(x_\alpha)$ be the true additive function and the covariate
$X_\alpha$ be distributed on a compact interval $[a_\alpha, b_\alpha]$. Without loss of generality,
we take all intervals $[a_\alpha, b_\alpha] = [0, 1]$ for $\alpha = 1, \ldots, p$. Noting (A.7) in
Appendix A.2, under some smoothness assumptions in Appendix A.1, $\eta_0$ can
be well approximated by spline functions. Let $\mathcal{S}_n$ be the space of polynomial
splines on $[0, 1]$ of degree $\varrho \ge 1$. We introduce a knot sequence with $J$ interior
knots,

$k_{-\varrho} = \cdots = k_{-1} = k_0 = 0 < k_1 < \cdots < k_J < 1 = k_{J+1} = \cdots = k_{J+\varrho+1}$,

where $J \equiv J_n$ increases when sample size $n$ increases and the precise order
is given in condition (C6). Then, $\mathcal{S}_n$ consists of functions $\varsigma$ satisfying the
following:

(i) $\varsigma$ is a polynomial of degree $\varrho$ on each of the subintervals $[k_j, k_{j+1})$,
$j = 0, \ldots, J_n - 1$, and the last subinterval is $[k_{J_n}, 1]$;

(ii) for $\varrho \ge 2$, $\varsigma$ is $(\varrho - 1)$-times continuously differentiable on $[0, 1]$.

For simplicity of proof, equally spaced knots are used. Let $h = 1/(J_n + 1)$ be
the distance between two consecutive knots.
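
To make the spline space concrete, the following Python sketch evaluates a B-spline basis on equally spaced knots with the Cox-de Boor recursion. It is an illustration only: the function names `knot_sequence` and `bspline_basis` are hypothetical, and the recursion is the textbook construction rather than code taken from the paper.

```python
def knot_sequence(n_interior, degree):
    """Equally spaced knots on [0, 1]; each boundary knot is repeated degree + 1 times."""
    h = 1.0 / (n_interior + 1)          # distance between consecutive knots
    interior = [j * h for j in range(1, n_interior + 1)]
    return [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)

def bspline_basis(x, knots, degree):
    """Return the values at x of all B-spline basis functions of the given degree."""
    # Index of the last non-degenerate interval, which is treated as closed at 1.
    last = max(j for j in range(len(knots) - 1) if knots[j] < knots[j + 1])
    # Degree-0 functions are indicators of [k_j, k_{j+1}).
    basis = [1.0 if (knots[j] <= x < knots[j + 1]) or (x == knots[-1] and j == last) else 0.0
             for j in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        new = []
        for j in range(len(knots) - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            left = (x - knots[j]) / left_den * basis[j] if left_den > 0 else 0.0
            right = (knots[j + d + 1] - x) / right_den * basis[j + 1] if right_den > 0 else 0.0
            new.append(left + right)
        basis = new
    return basis

# Cubic splines with 3 interior knots give 3 + 3 + 1 = 7 basis functions.
values = bspline_basis(0.3, knot_sequence(3, 3), 3)
```

The basis functions are nonnegative and sum to one at every point of $[0, 1]$, which is a convenient sanity check on any implementation.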

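Let $(Y_i, \mathbf{X}_i, \mathbf{Z}_i)$, $i = 1, \ldots, n$, be independent copies of $(Y, \mathbf{X}, \mathbf{Z})$. In the
$S$th submodel, we consider the additive spline estimates of $\eta_0$ based on
the independent random sample $(Y_i, \mathbf{X}_i, \Pi_S \mathbf{Z}_i)$, $i = 1, \ldots, n$. Let $\mathcal{G}_n$ be the
collection of functions $\eta$ with the additive form $\eta(\mathbf{x}) = \sum_{\alpha=1}^{p} \eta_\alpha(x_\alpha)$, where
each component function $\eta_\alpha \in \mathcal{S}_n$.

We would like to find a function $\eta \in \mathcal{G}_n$ and a value of $\beta_S$ that maximize
the quasi-likelihood function

(2.2)  $L(\eta, \beta_S) = \frac{1}{n} \sum_{i=1}^{n} Q[g^{-1}\{\eta(\mathbf{X}_i) + (\Pi_S \mathbf{Z}_i)^T \beta_S\}, Y_i], \qquad \eta \in \mathcal{G}_n$,

where $Q(m, y)$ is the quasi-likelihood function satisfying $\partial Q(m, y)/\partial m = (y - m)/V(m)$.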

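For the $\alpha$th covariate $x_\alpha$, let $b_{j,\alpha}(x_\alpha)$ be the B-spline basis function of
degree $\varrho$. For any $\eta \in \mathcal{G}_n$, one can write $\eta(\mathbf{x}) = \gamma^T \mathbf{b}(\mathbf{x})$, where
$\mathbf{b}(\mathbf{x}) = \{b_{j,\alpha}(x_\alpha), j = -\varrho, \ldots, J_n, \alpha = 1, \ldots, p\}^T$ are the spline basis functions and
$\gamma = \{\gamma_{j,\alpha}, j = -\varrho, \ldots, J_n, \alpha = 1, \ldots, p\}^T$ is the spline coefficient vector. Thus,
the maximization problem in (2.2) is equivalent to finding values of $\beta_S^*$ and
$\gamma^*$ that maximize

(2.3)  $\frac{1}{n} \sum_{i=1}^{n} Q[g^{-1}\{\gamma^{*T} \mathbf{b}(\mathbf{X}_i) + (\Pi_S \mathbf{Z}_i)^T \beta_S^*\}, Y_i]$.

We denote the maximizers as $\hat\beta_S^*$ and $\hat\gamma_S^* = \{\hat\gamma_{S,j,\alpha}^*, j = -\varrho, \ldots, J_n, \alpha = 1, \ldots, p\}^T$.
The spline estimator of $\eta_0$ is then $\hat\eta_S^* = \hat\gamma_S^{*T} \mathbf{b}(\mathbf{x})$ and the centered spline
estimators of each component function are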

$\hat\eta_{S,\alpha}^*(x_\alpha) = \sum_{j=-\varrho}^{J_n} \hat\gamma_{S,j,\alpha}^* b_{j,\alpha}(x_\alpha) - \frac{1}{n} \sum_{i=1}^{n} \sum_{j=-\varrho}^{J_n} \hat\gamma_{S,j,\alpha}^* b_{j,\alpha}(X_{i\alpha}), \qquad \alpha = 1, \ldots, p$.

The above estimation approach can be easily implemented with commonly
used statistical software since the resulting model is a generalized linear
model.
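
As a hedged illustration of this remark, the sketch below fits a Bernoulli response with logit link by Newton-Raphson (iteratively reweighted least squares) once spline columns have been appended to the design matrix. Several things here are assumptions made for brevity and are not the authors' implementation: the helper names `solve` and `fit_logistic` are hypothetical, a linear truncated power basis with one knot at 0.5 is swapped in for the B-spline basis (both span a linear spline space), and the toy data-generating model is invented for the example.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(design, y, n_iter=25):
    """Maximize the Bernoulli log-likelihood with logit link by Newton-Raphson."""
    k = len(design[0])
    coef = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(design, y):
            eta = sum(c * v for c, v in zip(coef, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += (yi - mu) * row[a]          # score contribution
                for b in range(k):
                    hess[a][b] += mu * (1.0 - mu) * row[a] * row[b]
        coef = [c + s for c, s in zip(coef, solve(hess, grad))]
    return coef

# Toy data (assumed model): logit P(Y = 1) = sin(2 pi x) + 1.5 z.
rng = random.Random(0)
design, y = [], []
for _ in range(400):
    x, z = rng.random(), rng.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(math.sin(2 * math.pi * x) + 1.5 * z)))
    # Columns: intercept, x, (x - 0.5)_+ for the spline part, then the linear covariate z.
    design.append([1.0, x, max(x - 0.5, 0.0), z])
    y.append(1 if rng.random() < p else 0)
coef = fit_logistic(design, y)
```

The last entry of `coef` plays the role of the linear parameter, while the first three describe the spline approximation of the nonparametric component.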

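For any measurable functions $\varphi_1$, $\varphi_2$ on $[0, 1]^p$, define the empirical inner
product and the corresponding norm as

$\langle \varphi_1, \varphi_2 \rangle_n = n^{-1} \sum_{i=1}^{n} \{\varphi_1(\mathbf{X}_i) \varphi_2(\mathbf{X}_i)\}, \qquad \|\varphi\|_n^2 = n^{-1} \sum_{i=1}^{n} \varphi^2(\mathbf{X}_i)$.

If $\varphi_1$ and $\varphi_2$ are $L^2$-integrable, define the theoretical inner product and
the corresponding norm as $\langle \varphi_1, \varphi_2 \rangle = E\{\varphi_1(\mathbf{X}) \varphi_2(\mathbf{X})\}$, $\|\varphi\|_2^2 = E\varphi^2(\mathbf{X})$,
respectively. Let $\|\varphi\|_{n\alpha}$ and $\|\varphi\|_{2\alpha}$ be the empirical and theoretical norms,
respectively, of a function $\varphi$ on $[0, 1]$, that is,

$\|\varphi\|_{n\alpha}^2 = n^{-1} \sum_{i=1}^{n} \varphi^2(X_{i\alpha}), \qquad \|\varphi\|_{2\alpha}^2 = E\varphi^2(X_\alpha) = \int_0^1 \varphi^2(x_\alpha) f_\alpha(x_\alpha)\, dx_\alpha$,

where $f_\alpha(x_\alpha)$ is the density function of $X_\alpha$.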

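Define the centered version of the spline basis, for any $\alpha = 1, \ldots, p$ and
$j = -\varrho + 1, \ldots, J_n$, as
$b_{j,\alpha}^*(x_\alpha) = b_{j,\alpha}(x_\alpha) - (\|b_{j,\alpha}\|_{2\alpha} / \|b_{j-1,\alpha}\|_{2\alpha})\, b_{j-1,\alpha}(x_\alpha)$, with the
standardized version given by

(2.4)  $B_{j,\alpha}(x_\alpha) = \frac{b_{j,\alpha}^*(x_\alpha)}{\|b_{j,\alpha}^*\|_{2\alpha}}$.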

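Note that finding $(\gamma^*, \beta_S^*)$ that maximizes (2.3) is mathematically equivalent
to finding $(\gamma, \beta_S)$ that maximizes

(2.5)  $\ell(\gamma, \beta_S) = \frac{1}{n} \sum_{i=1}^{n} Q[g^{-1}\{\gamma^T \mathbf{B}(\mathbf{X}_i) + (\Pi_S \mathbf{Z}_i)^T \beta_S\}, Y_i]$,

where $\mathbf{B}(\mathbf{x}) = \{B_{j,\alpha}(x_\alpha), j = -\varrho + 1, \ldots, J_n, \alpha = 1, \ldots, p\}^T$. Similarly to $\hat\beta_S^*$,
$\hat\gamma_S^*$, $\hat\eta_S^*$ and $\hat\eta_{S,\alpha}^*$, we can define $\hat\beta_S$, $\hat\gamma_S$, $\hat\eta_S$ and the centered spline
estimators of each component function $\hat\eta_{S,\alpha}(x_\alpha)$. In practice, the basis
$\{b_{j,\alpha}(x_\alpha), j = -\varrho, \ldots, J_n, \alpha = 1, \ldots, p\}^T$ is used for data analytic
implementation and the mathematically equivalent expression (2.4) is convenient for
asymptotic derivation.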

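Let $\rho_l(m) = \{dg^{-1}(m)/dm\}^l / V\{g^{-1}(m)\}$, $l = 1, 2$. Write $\mathbf{T} = (\mathbf{X}^T, \mathbf{Z}^T)^T$,
$m_0(\mathbf{T}) = \eta_0(\mathbf{X}) + \mathbf{Z}^T \beta_0$ and $\varepsilon = Y - g^{-1}\{m_0(\mathbf{T})\}$. $\mathbf{T}_i$, $m_0(\mathbf{T}_i)$ and $\varepsilon_i$ are defined in
the same way after replacing $\mathbf{X}$, $\mathbf{Z}$ and $\mathbf{T}$ by $\mathbf{X}_i$, $\mathbf{Z}_i$ and $\mathbf{T}_i$, respectively.
Write

$\Gamma(\mathbf{x}) = \frac{E[\mathbf{Z} \rho_1\{m_0(\mathbf{T})\} \mid \mathbf{X} = \mathbf{x}]}{E[\rho_1\{m_0(\mathbf{T})\} \mid \mathbf{X} = \mathbf{x}]}, \qquad \psi(\mathbf{T}) = \mathbf{Z} - \Gamma(\mathbf{X})$,

$G_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varepsilon_i \rho_1\{m_0(\mathbf{T}_i)\} \psi(\mathbf{T}_i), \qquad D = E[\rho_1\{m_0(\mathbf{T})\} \psi(\mathbf{T}) \{\psi(\mathbf{T})\}^T]$

and $\Sigma = E[\rho_1^2\{m_0(\mathbf{T})\} \varepsilon^2 \psi(\mathbf{T}) \{\psi(\mathbf{T})\}^T]$.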

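The following theorem shows that the estimators $\hat\beta_S$ on the basis of the
$S$th submodel are asymptotically normal.

Theorem 1. Under the local misspecification framework and conditions
(C1)-(C11) in the Appendix,

$\sqrt{n}\{\hat\beta_S - (\beta_{c,0}^T, 0)^T\} = -(\Pi_S D \Pi_S^T)^{-1} \Pi_S G_n + (\Pi_S D \Pi_S^T)^{-1} \Pi_S D \binom{0}{\delta} + o_p(1)$

$\qquad \stackrel{d}{\longrightarrow} -(\Pi_S D \Pi_S^T)^{-1} \Pi_S G + (\Pi_S D \Pi_S^T)^{-1} \Pi_S D \binom{0}{\delta}$

with $G_n \stackrel{d}{\longrightarrow} G \sim N(0, \Sigma)$, where "$\stackrel{d}{\longrightarrow}$" denotes convergence in distribution.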

Remark 1. If the link function $g$ is the identity and there is only one
nonparametric component (i.e., $p = 1$), then the result of Theorem 1
simplifies to those of Theorems 3.1-3.4 of Claeskens and Carroll (2007) under
the corresponding submodels.

Remark 2. Assume that $d_u = 0$. Theorem 1 indicates that the
polynomial-spline-based estimators of the linear parameters are asymptotically normal.
This is the first explicit theoretical result on asymptotic normality for
estimation of the linear parameters in GAPLMs and is of independent interest
and importance. This theorem also indicates that although there are several
nonparametric functions and their polynomial spline approximation induces
biases in the estimators of each nonparametric component, these biases do
not make the estimators of $\beta$ biased under condition (C6) imposed on the
number of knots.




3. Focused information criterion and frequentist model averaging. In
this section, based on the asymptotic result in Section 2, we develop an FIC
model selection procedure for GAPLMs and an FMA estimator, and propose a proper
confidence interval for the focus parameters.

3.1. Focused information criterion. Let $\mu_0 = \mu(\beta_0) = \mu(\beta_{c,0}, \delta/\sqrt{n})$ be
a focus parameter. Assume that the partial derivatives of $\mu(\beta_0)$ are
continuous in a neighborhood of $\beta_{c,0}$. Note that, in the $S$th submodel, $\mu_0$
can be estimated by $\hat\mu_S = \mu([I_{d_c}, 0_{d_c \times d_u}] \Pi_S^T \hat\beta_S, [0_{d_u \times d_c}, I_{d_u}] \Pi_S^T \hat\beta_S)$. We now
show the asymptotic normality of $\hat\mu_S$. Write $R_S = \Pi_S^T (\Pi_S D \Pi_S^T)^{-1} \Pi_S$,
$\mu_c = \partial\mu/\partial\beta_c \,|_{\beta_c = \beta_{c,0}, \beta_u = 0}$, $\mu_u = \partial\mu/\partial\beta_u \,|_{\beta_c = \beta_{c,0}, \beta_u = 0}$ and $\mu_\beta = (\mu_c^T, \mu_u^T)^T$.
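
Theorem 2. Under the local misspecification framework and conditions
(C1)-(C11) in the Appendix, we have

$\sqrt{n}(\hat\mu_S - \mu_0) = -\mu_\beta^T R_S G_n + \mu_\beta^T (R_S D - I) \binom{0}{\delta} + o_p(1)$

$\qquad \stackrel{d}{\longrightarrow} \Lambda_S \equiv -\mu_\beta^T R_S G + \mu_\beta^T (R_S D - I) \binom{0}{\delta}$.

Recall $G \sim N(0, \Sigma)$. A direct calculation yields

(3.1)  $E(\Lambda_S^2) = \mu_\beta^T \left\{ R_S \Sigma R_S + (R_S D - I) \binom{0}{\delta} \binom{0}{\delta}^T (R_S D - I)^T \right\} \mu_\beta$.

Let $\hat\delta$ be the estimator of $\delta$ by the full model. Then, from Theorem 1, we
know that

$\hat\delta = -[0, I] D^{-1} G_n + \delta + o_p(1)$.

If we define $\Delta = -[0, I] D^{-1} G + \delta \sim N(\delta, [0, I] D^{-1} \Sigma D^{-1} [0, I]^T)$, then $\hat\delta \stackrel{d}{\longrightarrow} \Delta$.
Following Claeskens and Hjort (2003) and (3.1), we define the FIC of the
$S$th submodel as

(3.2)  $\mathrm{FIC}_S = \mu_\beta^T \left\{ R_S \Sigma R_S + (R_S D - I) \binom{0}{\hat\delta} \binom{0}{\hat\delta}^T (R_S D - I)^T - (R_S D - I) \begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix} D^{-1} \Sigma D^{-1} \begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix} (R_S D - I)^T \right\} \mu_\beta$,

which is an approximately unbiased estimator of the mean squared error
when $\sqrt{n}\mu_0$ is estimated by $\sqrt{n}\hat\mu_S$. This FIC can be used for choosing a
proper submodel relying on the parameter of interest.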
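
The FIC of a submodel is a quadratic form in $\mu_\beta$ built from $R_S$, $D$, $\Sigma$ and $\hat\delta$, so it can be evaluated with a few matrix products. The Python sketch below is one way to do this under stated assumptions: the helper names (`matmul`, `transpose`, `inverse`, `fic`) are hypothetical, matrices are plain lists of lists, the inputs are taken as already estimated, and the numeric values of `D` and `mu_beta` in the usage lines are invented for illustration.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fic(Pi_S, D, Sigma, mu_beta, delta_hat):
    """Evaluate the FIC quadratic form for the submodel with selection matrix Pi_S."""
    d, d_u = len(D), len(delta_hat)
    d_c = d - d_u
    Pi_T = transpose(Pi_S)
    R = matmul(matmul(Pi_T, inverse(matmul(matmul(Pi_S, D), Pi_T))), Pi_S)   # R_S
    A = matmul(R, D)
    A = [[A[i][j] - (i == j) for j in range(d)] for i in range(d)]           # R_S D - I
    v = [[0.0]] * d_c + [[x] for x in delta_hat]                             # (0, delta_hat)
    K = [[float(i == j and i >= d_c) for j in range(d)] for i in range(d)]   # diag(0, I)
    D_inv = inverse(D)
    V = matmul(matmul(D_inv, Sigma), D_inv)                                  # D^-1 Sigma D^-1
    variance = matmul(matmul(R, Sigma), R)
    bias = matmul(matmul(matmul(A, v), transpose(v)), transpose(A))
    correction = matmul(matmul(matmul(matmul(A, K), V), K), transpose(A))
    total = [[variance[i][j] + bias[i][j] - correction[i][j] for j in range(d)]
             for i in range(d)]
    mu = [[m] for m in mu_beta]
    return matmul(matmul(transpose(mu), total), mu)[0][0]

# Illustrative inputs: two certain covariates, one exploratory covariate, Sigma = D.
D = [[2.0, 0.5, 0.2], [0.5, 1.5, 0.3], [0.2, 0.3, 1.0]]
mu_beta = [1.0, 0.5, -0.3]
full = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
narrow = [[1, 0, 0], [0, 1, 0]]
fic_full = fic(full, D, D, mu_beta, [1.2])
fic_narrow = fic(narrow, D, D, mu_beta, [1.2])
```

For the full model $R_S D = I$, so the bias and correction terms vanish and the value reduces to $\mu_\beta^T D^{-1} \Sigma D^{-1} \mu_\beta$; a large $\hat\delta$ penalizes the narrow model, while $\hat\delta = 0$ favors it.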

3.2. Frequentist model averaging. As mentioned previously, an average 
estimator is an alternative to a model selection estimator. There are at least 
two advantages to the use of an average estimator. First, an average estima- 
tor often reduces mean square error in estimation because it avoids ignoring 
useful information from the form of the relationship between response and 
covariates and it provides a kind of insurance against selecting a very poor 
submodel. Second, model averaging procedures can be more stable than 
model selection, for which small changes in the data often lead to a sig- 
nificant change in model choice. Similar discussions of this issue appear in 
Bates and Granger (1969) and Leung and Barron (2006). 

By choosing a submodel with the minimum value of FIC, the FIC estimators
of $\mu$ can be written as $\hat\mu_{\mathrm{FIC}} = \sum_S I(\text{FIC selects the } S\text{th submodel})\, \hat\mu_S$,
where $I(\cdot)$, an indicator function, can be thought of as a weight function
depending on the data via $\hat\delta$, yet it just takes value either 0 or 1. To smooth
estimators across submodels, we may formulate the model average estimator
of $\mu$ as

(3.3)  $\hat\mu = \sum_S w(S \mid \hat\delta)\, \hat\mu_S$,

where the weights $w(S \mid \hat\delta)$ take values in the interval $[0, 1]$ and their sum
equals 1. It is readily seen that smoothed AIC, BIC and FIC estimators
investigated in Hjort and Claeskens (2003) and Claeskens and Carroll (2007)
share this form. The following theorem shows an asymptotic property for the
general model average estimators $\hat\mu$ defined in (3.3) under certain conditions.

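Theorem 3. Under the local misspecification framework and conditions
(C1)-(C11) in the Appendix, if the weight functions have at most a countable
number of discontinuities, then

$\sqrt{n}(\hat\mu - \mu_0) = -\mu_\beta^T D^{-1} G_n + \mu_\beta^T \left\{ Q(\hat\delta) \binom{0}{\hat\delta} - \binom{0}{\hat\delta} \right\} + o_p(1)$

$\qquad \stackrel{d}{\longrightarrow} \Lambda \equiv -\mu_\beta^T D^{-1} G + \mu_\beta^T \left\{ Q(\Delta) \binom{0}{\Delta} - \binom{0}{\Delta} \right\}$,

where $Q(\cdot) = \sum_S w(S \mid \cdot) R_S D$ and $\Delta$ is defined in Section 3.1.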

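Referring to the above theorems, we construct a confidence interval for
$\mu$ based on the model average estimator $\hat\mu$, as follows. Assume that $\hat\kappa^2$ is a
consistent estimator of $\mu_\beta^T D^{-1} \Sigma D^{-1} \mu_\beta$. It is easily seen that

$\left[ \sqrt{n}(\hat\mu - \mu_0) - \mu_\beta^T \left\{ Q(\hat\delta) \binom{0}{\hat\delta} - \binom{0}{\hat\delta} \right\} \right] / \hat\kappa \stackrel{d}{\longrightarrow} N(0, 1)$.

If we define the lower bound ($low_n$) and upper bound ($up_n$) by

(3.4)  $low_n = \hat\mu - \mu_\beta^T \left\{ Q(\hat\delta) \binom{0}{\hat\delta} - \binom{0}{\hat\delta} \right\} / \sqrt{n} - z_j \hat\kappa / \sqrt{n}, \qquad up_n = \hat\mu - \mu_\beta^T \left\{ Q(\hat\delta) \binom{0}{\hat\delta} - \binom{0}{\hat\delta} \right\} / \sqrt{n} + z_j \hat\kappa / \sqrt{n}$,

where $z_j$ is the $j$th standard normal quantile, then we have
$\Pr\{\mu_0 \in (low_n, up_n)\} \to 2\Phi(z_j) - 1$, where $\Phi(\cdot)$ is a standard normal distribution function.
Therefore, the interval $(low_n, up_n)$ can be used as a confidence interval for
$\mu_0$ with asymptotic level $2\Phi(z_j) - 1$.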
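
The interval is simple to compute once its ingredients are available. The sketch below assumes that the model average estimate, the scalar bias-correction term $\mu_\beta^T\{Q(\hat\delta)(0, \hat\delta^T)^T - (0, \hat\delta^T)^T\}$ and $\hat\kappa$ have already been estimated. The function name `fma_confidence_interval` and its argument names are hypothetical, and the numbers in the usage line are invented.

```python
import math
from statistics import NormalDist

def fma_confidence_interval(mu_hat, bias_term, kappa_hat, n, level=0.95):
    """Bias-corrected interval around the model average estimate.

    bias_term is the scalar mu_beta^T {Q(delta_hat)(0, delta_hat) - (0, delta_hat)}.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # chosen so that 2 * Phi(z) - 1 = level
    center = mu_hat - bias_term / math.sqrt(n)
    half_width = z * kappa_hat / math.sqrt(n)
    return center - half_width, center + half_width

low_n, up_n = fma_confidence_interval(mu_hat=1.5, bias_term=0.4, kappa_hat=2.0, n=400)
```

The interval is centered at the bias-corrected estimate rather than at $\hat\mu$ itself, and its length depends only on $\hat\kappa$, $n$ and the chosen level.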

Remark 3. Note that the limit distribution of $\sqrt{n}(\hat\mu - \mu_0)$ is a nonlinear
mixture of several normal variables. As argued in Hjort and Claeskens
(2006), a direct construction of a confidence interval based on Theorem 3
may not be easy. The confidence interval based on (3.4) is better in terms
of coverage probability and computational simplicity, as promoted in Hjort
and Claeskens (2003) and advocated by Claeskens and Carroll (2007).

Remark 4. A referee has asked whether the focus parameter can depend
on the nonparametric function $\eta_0$. Our answer is "yes." For instance,
we consider a general focus parameter, $\eta_0(\mathbf{x}) + \mu_0$, the sum of $\mu_0$, which
we have studied, and the value of the nonparametric function at $\mathbf{x}$. We may continue to get
an estimator of $\eta_0(\mathbf{x}) + \mu_0$ by minimizing (3.2) and then model-averaging
estimators by weighting the estimators of $\mu_0$ and $\eta_0$ as in (3.3). However,
the underlying FMA estimators are not root-$n$ consistent because the bias
of these estimators is proportional to the bias of the estimators of $\eta_0$, which
is larger than $n^{-1/2}$, although we can establish their rates of convergence
using easier arguments than those employed in the proof of Theorem 3.
Even when the focus parameters depend on $\mu_0$ and $\eta_0$ through a general form
$H(\mu_0, \eta_0)$ for a given function $H(\cdot, \cdot)$, the proposed method can still be
applied. However, developing asymptotic properties for the corresponding FMA
estimators depends on the form of $H(\cdot, \cdot)$ and will require further investigation.
We omit the details. Our numerical studies below follow these proposals
when the focus parameters are related to the nonparametric functions.

4. Simulation study. We generated 1000 data sets consisting of $n = 200$
and 400 observations from the GAPLM

$\operatorname{logit}\{\Pr(Y_i = 1)\} = \eta_1(X_{i,1}) + \eta_2(X_{i,2}) + \mathbf{Z}_i^T \beta$

$\qquad = \sin(2\pi X_{i,1}) + 5X_{i,2}^4 + 3X_{i,2}^2 - 2 + \mathbf{Z}_i^T \beta, \qquad i = 1, \ldots, n$,

where: the true parameter $\beta = \{1.5, 2, r_0(2, 1, 3)/\sqrt{n}\}^T$; $X_{i,1}$ and $X_{i,2}$ are
independently uniformly distributed on $[0, 1]$; $Z_{i,1}, \ldots, Z_{i,5}$ are normally
distributed with mean 0 and variance 1; when $\ell_1 \ne \ell_2$, the correlation between
$Z_{i,\ell_1}$ and $Z_{i,\ell_2}$ is $\varpi^{|\ell_1 - \ell_2|}$ with $\varpi = 0$ or $\varpi = 0.5$; $\mathbf{Z}_i$ is independent of $X_{i,1}$
and $X_{i,2}$. We set the first two components of $\beta$ to be in all submodels. The
other three may or may not be present, so we have $2^3 = 8$ submodels to be
selected or averaged across. $r_0$ takes the values 1, 4 and 7. Our focus parameters
are (i) $\mu_1 = \beta_1$, (ii) $\mu_2 = \beta_2$, (iii) $\mu_3 = 0.75\beta_1 + 0.05\beta_2 - 0.3\beta_3 + 0.1\beta_4 - 0.06\beta_5$
and (iv) $\mu_4 = \eta_1(0.86) + \eta_2(0.53) + 0.32\beta_1 - 0.87\beta_2 - 0.33\beta_3 - 0.15\beta_4 + 0.13\beta_5$.
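
A minimal sketch of this data-generating process in Python follows. It rests on assumptions that should be read as such: the function name `simulate_gaplm` is hypothetical, the correlation between exploratory covariates is taken to be $\varpi^{|\ell_1 - \ell_2|}$ and is produced by a first-order autoregressive recursion, and the seeds are arbitrary.

```python
import math
import random

def simulate_gaplm(n, r0, varpi, seed=0):
    """Draw one data set of size n from the logistic GAPLM of the simulation study."""
    rng = random.Random(seed)
    beta = [1.5, 2.0] + [r0 * c / math.sqrt(n) for c in (2.0, 1.0, 3.0)]
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        # AR(1)-type recursion: unit variances and corr(Z_k, Z_l) = varpi ** |k - l|.
        z = [rng.gauss(0.0, 1.0)]
        for _k in range(4):
            z.append(varpi * z[-1] + math.sqrt(1.0 - varpi ** 2) * rng.gauss(0.0, 1.0))
        eta = math.sin(2 * math.pi * x1) + 5 * x2 ** 4 + 3 * x2 ** 2 - 2
        linear = eta + sum(b * v for b, v in zip(beta, z))
        prob = 1.0 / (1.0 + math.exp(-linear))
        y = 1 if rng.random() < prob else 0
        data.append((y, (x1, x2), tuple(z)))
    return beta, data

beta, data = simulate_gaplm(n=200, r0=4, varpi=0.5, seed=1)
```

Both nonparametric components integrate to zero over $[0, 1]$, which matches the identifiability constraint $E\{\eta_\alpha(X_\alpha)\} = 0$ for uniform covariates.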

The cubic B-splines have been used to approximate the two nonparametric
functions. We propose to select $J_n$ using a BIC procedure. Based on
condition (C6), the optimal order of $J_n$ can be found in the range

$(n^{1/(2\upsilon)}, n^{1/3})$.

Thus, we propose to choose the optimal knot number, $J_n$, from a
neighborhood of $n^{1/5.5}$. For our numerical examples, we have used $[2/3 N_r, 4/3 N_r]$,
where $N_r = \mathrm{ceiling}(n^{1/5.5})$ and the function $\mathrm{ceiling}(\cdot)$ returns the smallest
integer not less than the corresponding element. Under the full model, let
the log-likelihood function be $l_n(N_n)$. The optimal knot number, $N_n^{opt}$, is
then the one which minimizes the BIC value. That is,

(4.1)  $N_n^{opt} = \arg\min_{N_n \in [2/3 N_r,\, 4/3 N_r]} \{-2 l_n(N_n) + q_n \log n\}$,

where $q_n$ is the total number of parameters.
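
The BIC search over knot numbers can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names, the `rate_exponent` argument, the rounding of the interval endpoints to integers and the toy log-likelihood in the usage lines are all hypothetical.

```python
import math

def candidate_knot_numbers(n, rate_exponent):
    """Integer knot numbers in [2/3 N_r, 4/3 N_r] with N_r = ceiling(n ** rate_exponent)."""
    n_r = math.ceil(n ** rate_exponent)
    lowest = max(math.ceil(2 * n_r / 3), 1)
    highest = math.floor(4 * n_r / 3)
    return list(range(lowest, highest + 1))

def select_knots_bic(n, candidates, loglik, n_params):
    """Return the candidate minimizing -2 * loglik(k) + n_params(k) * log(n)."""
    return min(candidates, key=lambda k: -2.0 * loglik(k) + n_params(k) * math.log(n))

# Toy usage: loglik and n_params stand in for a fitted full model.
candidates = candidate_knot_numbers(200, 1 / 5.5)
best = select_knots_bic(
    200,
    candidates,
    loglik=lambda k: -100.0 - 50.0 / k,      # hypothetical log-likelihood values
    n_params=lambda k: 2 * (k + 3) + 5,      # hypothetical parameter count
)
```

In a real analysis `loglik(k)` would refit the full model with `k` interior knots and return its maximized log-likelihood.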

Four model selection or model averaging methods are compared in this
simulation: AIC, BIC, FIC and the smoothed FIC (S-FIC). The smoothed
FIC weights we have used are
a case of expression (5.4) in Hjort and Claeskens (2003). When using the
FIC or S-FIC method, we estimate $D^{-1} \Sigma D^{-1}$ by the covariance matrix
of $\hat\beta_{full}$ and estimate $D$ by its sample mean, as advocated by Hjort and
Claeskens (2003) and Claeskens and Carroll (2007). Thus, $\Sigma$ can be
calculated straightforwardly. Note that the subscript "full" denotes the estimator
using the full model.
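
Smoothed weights of this kind can be sketched as an exponential (softmax-type) transform of the FIC values. The exact expression used by the authors is not reproduced in this section, so the form below, including the factor 0.5, the tuning constant `kappa` and the normalizing `scale`, is an assumption made for illustration; the function names are hypothetical as well.

```python
import math

def smoothed_fic_weights(fic_values, scale, kappa=1.0):
    """Weights proportional to exp(-0.5 * kappa * FIC_S / scale), normalized to sum to 1."""
    smallest = min(fic_values)               # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * kappa * (f - smallest) / scale) for f in fic_values]
    total = sum(raw)
    return [r / total for r in raw]

def model_average(estimates, weights):
    """Weighted combination of the submodel estimates of the focus parameter."""
    return sum(w * e for w, e in zip(weights, estimates))

weights = smoothed_fic_weights([2.0, 1.0, 3.0], scale=1.5)
mu_avg = model_average([0.9, 1.1, 1.4], weights)
```

Setting `kappa = 0` gives equal weights, while a large `kappa` concentrates the weight on the submodel with the smallest FIC and so approaches plain FIC selection.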

In this simulation, one of our purposes is to see whether the traditional
selection methods like AIC and BIC lead to an overly optimistic coverage
probability (CP) of a claimed confidence interval (CI). We consider a claimed
95% confidence interval. The other purpose is to check the accuracy of
estimators in terms of their mean squared errors (MSE), $\frac{1}{1000} \sum_j (\hat\mu_a^{(j)} - \mu_a)^2$
for $a = 1, \ldots, 4$, where $j$ denotes the $j$th replication. Our results are listed
in Table 1.
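
Both summaries are plain averages over the replications. A minimal Python sketch, with hypothetical function names and invented toy inputs, is:

```python
def mse(estimates, truth):
    """Mean squared error of the replicated estimates around the true focus parameter."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

def coverage(intervals, truth):
    """Fraction of replicated intervals (low, up) that contain the true value."""
    return sum(1 for low, up in intervals if low < truth < up) / len(intervals)

toy_mse = mse([1.0, 3.0], truth=2.0)
toy_cp = coverage([(0.0, 1.0), (2.0, 3.0), (0.5, 2.5)], truth=0.8)
```

In the simulation, `estimates` and `intervals` would each hold 1000 entries, one per generated data set.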

These results indicate that the performance of both the FIC and S-FIC,
especially the latter, is superior to that of AIC and BIC in terms of CP
and MSE, regardless of whether the focus parameter
depends on the nonparametric components or not. The CPs based on FIC
and S-FIC are generally close to the nominal level. The smallest CPs
based on S-FIC and FIC are 0.921 and 0.914, respectively, whereas the corresponding
CPs of AIC and BIC are only 0.860 and 0.843, respectively, which are much
lower than the nominal level of 95%. The CPs of both S-FIC and FIC are higher than
those from full models, but close to the nominal level, whereas the intervals
of FIC and S-FIC have the same length as those from the full models because
we estimate the unknown quantities in (3.4) under the full model.

When $r_0$ gets bigger, the MSEs based on S-FIC are substantially smaller
than those obtained from other criteria. It is worth mentioning that in Tables
1 and 2, we do not report the CPs corresponding to FIC and S-FIC for
$\mu_4$ because we do not derive an asymptotic distribution for the proposed
estimators of this focus parameter.

As suggested by a referee, we now numerically examine the effects of the
number of knots on the performance of these criteria. We generate the
data and conduct the simulation in the same way as above, but oversmoothing
and undersmoothing nonparametric terms by letting $N_r$ = ceiling(n^/^)
and $N_r$ = ceiling(n^/^'^), respectively. The results corresponding to
undersmoothing show a similar pattern as in Table 1. Note that derivatives of
all orders of functions $\eta_1(X_{i,1})$ and $\eta_2(X_{i,2})$ exist and satisfy the Lipschitz
condition. $N_r$ = ceiling(n^/^'^) is still in the range $(n^{1/(2\upsilon)}, n^{1/3})$, so this
similarity is not surprising and supports our theory. However, oversmoothing of
the nonparametric functions causes significant changes and generally
produces larger MSEs but lower CPs, while all of the results show a preference
for the S-FIC and FIC. To save space, we report the results with $n = 400$
and $r_0 = 4$ in Table 2, but omit other results, which show similar features
to those reported in Table 2.



Table 1
Simulation results. Full: using all variables; CP: coverage probability; MSE: mean squared error

                           M1                      M2                      M3                      M4
                  $\varpi=0$$\varpi=0.5$  $\varpi=0$$\varpi=0.5$  $\varpi=0$$\varpi=0.5$  $\varpi=0$$\varpi=0.5$
n    r0  Method     CP   MSE    CP   MSE    CP   MSE    CP   MSE    CP   MSE    CP   MSE    CP   MSE    CP   MSE

200  1   Full      0.9  0.33   0.9  0.49  0.89  0.49  0.88   0.8   0.9  0.22   0.9  0.32  0.92  2.25  0.91  2.92
         AIC      0.91  0.31   0.9  0.46   0.9  0.45  0.87  0.76  0.89  0.21  0.88   0.3  0.91  2.15   0.9  2.77
         BIC      0.92  0.28   0.9   0.4  0.91  0.39  0.88  0.71   0.9  0.19  0.88  0.26  0.92  1.98   0.9  2.66
         FIC      0.92  0.28  0.93  0.39  0.92  0.33  0.91  0.79  0.92  0.19  0.92  0.25     -     2     -  2.66
         S-FIC    0.93  0.28  0.93  0.41  0.93   0.4  0.92  0.68  0.93  0.19  0.92  0.26     -     2     -  2.61

         Full     0.89  0.35   0.9  0.73   0.9  0.48  0.88  1.16   0.9  0.19  0.91  0.35  0.94  1.79  0.91  3.42
         AIC      0.89  0.34   0.9  0.69   0.9  0.47  0.86  1.16  0.89  0.19  0.89  0.35  0.94  1.75   0.9  3.39
         BIC       0.9  0.31  0.91  0.63  0.91  0.42  0.84  1.17  0.87  0.19  0.87  0.35  0.94  1.67  0.89   3.4
         FIC      0.95  0.19  0.95  0.34  0.94  0.33  0.93  0.79  0.93  0.14  0.95  0.24     -  1.52     -   2.7
         S-FIC    0.97  0.17  0.97  0.32  0.97  0.22  0.97  0.68  0.96  0.13  0.97  0.22     -  1.32     -  2.47

         Full     0.89  0.46   0.9  1.02  0.89  0.66  0.87  2.04   0.9   0.2  0.92  0.41  0.92  2.26  0.92  5.32
         AIC      0.89  0.46   0.9     1  0.89  0.65  0.86  2.04   0.9   0.2  0.91  0.41  0.92  2.24  0.91  5.28
         BIC      0.89  0.44  0.91  0.93   0.9  0.62  0.86  1.92   0.9   0.2  0.88  0.41  0.92  2.18  0.91  4.79
         FIC      0.94  0.21  0.97  0.36  0.94  0.33  0.95  0.79  0.95  0.12  0.97  0.19     -  1.87     -  2.98
         S-FIC    0.97  0.12  0.98  0.22  0.97  0.16  0.98  0.63  0.98  0.09  0.98  0.15     -  1.24     -  2.57

400  1   Full     0.93  0.07  0.92   0.1  0.93  0.11  0.93  0.15  0.93  0.05  0.92  0.07  0.94  0.52  0.94  0.67
         AIC      0.94  0.07  0.92   0.1  0.93   0.1  0.91  0.14  0.93  0.04  0.91  0.07  0.94  0.51  0.93  0.66
         BIC      0.94  0.06  0.93  0.09  0.94  0.09  0.91  0.14  0.93  0.04  0.91  0.06  0.94   0.5  0.93  0.65
         FIC      0.94  0.06  0.93  0.09  0.94  0.09  0.94  0.15  0.94  0.04  0.93  0.06     -   0.5     -  0.65
         S-FIC    0.95  0.06  0.93  0.09  0.94   0.1  0.93  0.14  0.94  0.04  0.94  0.06     -  0.51     -  0.64

         Full     0.94  0.07  0.91  0.12  0.93  0.11   0.9  0.19  0.94  0.04  0.91  0.08  0.94  0.54  0.93  0.78
         AIC      0.94  0.07  0.92  0.12  0.93  0.11  0.89   0.2  0.94  0.04  0.87  0.08  0.94  0.53  0.92  0.79
         BIC      0.94  0.07  0.92  0.12  0.94   0.1  0.88  0.22  0.92  0.05  0.87  0.09  0.94  0.52   0.9  0.83
         FIC      0.95  0.05  0.93  0.09  0.95  0.09  0.92  0.15  0.96  0.04  0.94  0.06     -  0.49     -  0.72
         S-FIC    0.97  0.05  0.95  0.09  0.97  0.07  0.95  0.16  0.97  0.04  0.94  0.06     -  0.46     -  0.69

         Full     0.92  0.08   0.9  0.14  0.93  0.11  0.91  0.21  0.94  0.04  0.91  0.08  0.94  0.52  0.93  0.82
         AIC      0.92  0.08   0.9  0.14  0.92  0.11  0.91  0.21  0.94  0.04  0.89  0.08  0.94  0.51  0.93  0.81
         BIC      0.93  0.08  0.91  0.13  0.93  0.11  0.89  0.22  0.94  0.04  0.86  0.09  0.94   0.5  0.92  0.82
         FIC      0.94  0.06  0.92   0.1  0.93  0.09  0.93  0.15  0.94  0.04  0.93  0.07     -  0.47     -  0.68
         S-FIC    0.95  0.05  0.96  0.07  0.95  0.06  0.96  0.12  0.96  0.03  0.96  0.05     -  0.38     -   0.6

5. Real-world data analysis. In this section, we apply our methods to a
data set from a Pima Indian diabetes study and perform some model
selection and averaging procedures. The data set is obtained from the UCI
Repository of Machine Learning Databases and was selected from a larger
data set held by the National Institute of Diabetes and Digestive and
Kidney Diseases. The patients under consideration are Pima Indian women at
least 21 years old and living near Phoenix, Arizona. The response variable,
Y, taking the value of 0 or 1, indicates a positive or negative test for
diabetes. The eight covariates are PGC (plasma glucose concentration after
two hours in an oral glucose tolerance test), DPF (diabetes pedigree
function), DBP [diastolic blood pressure (mm Hg)], NumPreg (the number of
times pregnant), SI [two-hour serum insulin (mu U/ml)], TSFT [triceps skin
fold thickness (mm)], BMI (body mass index [weight in kg/(height in m)^2])
and AGE (years). We then consider the following GAPLM for this data
analysis:

\[ \operatorname{logit}\{\Pr(Y = 1)\} = \eta_1(\mathrm{BMI}) + \eta_2(\mathrm{AGE}) + \beta_1\,\mathrm{PGC} + \beta_2\,\mathrm{DPF} + \beta_3\,\mathrm{DBP} + \beta_4\,\mathrm{NumPreg} + \beta_5\,\mathrm{SI} + \beta_6\,\mathrm{TSFT}, \]

where AGE and BMI are set in nonparametric components; Figure 1 confirms
that the effects of these two covariates on the log odds are nonlinear.
All covariates have been centered by their sample means and standardized
by their sample standard errors.
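
To make the model specification concrete, the following minimal sketch fits a model of this form to synthetic data. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`standardize`, `cubic_spline_basis`, `fit_logistic`) are hypothetical, a truncated-power cubic basis with quantile knots stands in for the B-spline basis, the fit uses plain Newton-Raphson, "sample standard error" is read as the sample standard deviation, and the simulated covariates and coefficients are invented for the example.

```python
import numpy as np

def standardize(a):
    """Center each column at its sample mean and scale by its sample standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def cubic_spline_basis(x, n_knots=3):
    """Truncated-power cubic spline basis without intercept; knots at equally spaced quantiles."""
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    columns = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(columns)

def fit_logistic(design, y, n_iter=100, tol=1e-10):
    """Maximize the Bernoulli log-likelihood with the logit link by Newton-Raphson."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(design @ coef)))
        score = design.T @ (y - prob)
        hessian = design.T @ (design * (prob * (1.0 - prob))[:, None])
        step = np.linalg.solve(hessian, score)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef

# Synthetic stand-in data (assumption): two nonparametric covariates, two linear ones.
rng = np.random.default_rng(0)
n = 2000
x = standardize(rng.uniform(-2.0, 2.0, size=(n, 2)))
z = standardize(rng.normal(size=(n, 2)))
true_beta = np.array([1.0, -0.5])
linear_predictor = 0.5 * x[:, 0] ** 2 + np.sin(2.0 * x[:, 1]) + z @ true_beta - 0.5
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-linear_predictor)))

# Design matrix: intercept, spline bases for the two nonparametric terms, linear covariates.
design = np.column_stack(
    [np.ones(n), cubic_spline_basis(x[:, 0]), cubic_spline_basis(x[:, 1]), z]
)
coef = fit_logistic(design, y)
beta_hat = coef[-2:]  # estimates of the linear coefficients
```

Because the spline terms enter the design matrix as ordinary columns, the whole model is fitted in one pass of a standard generalized linear model routine, with no backfitting loop over the additive components.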

We first fit the model with all covariates using the polynomial spline 
method introduced in Section 2. The cubic B-splines have been used to ap- 
proximate the two nonparametric functions. The number of knots was chosen 

Table 2
Simulation results of overfitting with n = 400 and $r_0 = 4$

                        M1                            M2
            $\varpi=0$    $\varpi=0.5$    $\varpi=0$    $\varpi=0.5$
Method        CP     MSE     CP     MSE     CP     MSE     CP     MSE
Full       0.864   0.131  0.852   0.232  0.852   0.211  0.840   0.365
AIC        0.869   0.129  0.863   0.226  0.851   0.207  0.805   0.381
BIC        0.884   0.117  0.872   0.210  0.863   0.186  0.770   0.409
FIC        0.942   0.086  0.917   0.154  0.922   0.131  0.874   0.300
S-FIC      0.952   0.081  0.932   0.149  0.946   0.123  0.916   0.300

                        M3                            M4
Full       0.884   0.073  0.863   0.138  0.928   1.055  0.910   1.548
AIC        0.874   0.073  0.813   0.142  0.931   1.053  0.909   1.571
BIC        0.863   0.077  0.782   0.152  0.929   1.028  0.897   1.606
FIC        0.914   0.060  0.915   0.107      -   0.967      -   1.443
S-FIC      0.949   0.064  0.921   0.110      -   0.910      -   1.361

Table 3
Results for the diabetes study: estimated values, associated standard errors
and P-values obtained using the full model

            Estimated value   Standard error   P-value
PGC               1.1698          0.1236        0.0000
DPF               0.3323          0.1029        0.0012
DBP              -0.2662          0.1040        0.0104
NumPreg           0.1887          0.1209        0.1184
SI               -0.1511          0.1078        0.1610
TSFT              0.0179          0.1135        0.8749

using the BIC, presented in (4.1). The fitted curves of the two
nonparametric components $\eta_1(\mathrm{BMI})$ and $\eta_2(\mathrm{AGE})$
are depicted in Figure 1. The estimated values of the $\beta_j$'s, their
standard errors (SE) and the corresponding P-values are listed in Table 3.
The results indicate that PGC and DPF are highly significant, while the
other four covariates are much less so; we therefore run model selection
and averaging on these four covariates. Accordingly, there are $2^4 = 16$
submodels.
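
The P-values in Table 3 appear to be consistent with two-sided Wald tests based on the standard normal distribution; this is an inference from the reported numbers, not something stated in the text. The short sketch below recomputes them from the estimates and standard errors copied from Table 3. The function name `wald_test` is a hypothetical helper.

```python
import math

# (estimate, standard error, reported P-value) copied from Table 3.
TABLE3 = {
    "PGC": (1.1698, 0.1236, 0.0000),
    "DPF": (0.3323, 0.1029, 0.0012),
    "DBP": (-0.2662, 0.1040, 0.0104),
    "NumPreg": (0.1887, 0.1209, 0.1184),
    "SI": (-0.1511, 0.1078, 0.1610),
    "TSFT": (0.0179, 0.1135, 0.8749),
}

def wald_test(estimate, std_error):
    """Return the Wald z-statistic and its two-sided standard normal P-value."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal cdf Phi.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

wald = {name: wald_test(est, se) for name, (est, se, _) in TABLE3.items()}
for name, (z, p) in wald.items():
    print(f"{name:8s} z = {z:7.3f}  P = {p:.4f}")
```

Under this reading, only PGC, DPF and DBP have P-values below 0.05, which matches the decision to treat PGC and DPF as certain and to search over the remaining four covariates.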

We now consider four focus parameters: $\mu_1 = \beta_1$, $\mu_2 = \beta_2$,

\[ \mu_3 = \eta_1(-1.501) + \eta_2(0.585) + 0.028\beta_1 - 0.899\beta_2 - 1.570\beta_3 + 1.087\beta_4 - 0.223\beta_5 - 0.707\beta_6 \]

and

\[ \mu_4 = \eta_1(-0.059) + \eta_2(1.363) + 0.994\beta_1 + 0.423\beta_2 + 0.645\beta_3 + 1.117\beta_4 - 0.221\beta_5 + 0.055\beta_6. \]

The first two are just the single coefficients of PGC and DPF, the two
most significant linear components. The last two are related to the
nonparametric terms. Specifically, $\mu_3$ represents the log odds at
BMI = 22.2, the lowest point of the estimated curve in the left panel of
Figure 1, and the corresponding means of other predictors when BMI = 22.2,



Fig. 1. The patterns of the nonparametric functions of BMI (left panel) and AGE
(right panel) (solid lines) with ±SE (broken lines).
while $\mu_4$ represents the log odds at AGE = 49, the highest point of the
estimated curve in the right panel of Figure 1, and the corresponding means
of other predictors when AGE = 49. We label the 16 potential submodels "0,"
"3," "4," "5," "6,"..., "3456" according to which of DBP, NumPreg, SI and
TSFT (the covariates of $\beta_3$, $\beta_4$, $\beta_5$ and $\beta_6$) the
submodel includes. The results based on the AIC, BIC and FIC methods are
presented in Table 4. Regardless of the focus parameter, the AIC and BIC
select submodels "345" and "3," respectively. On the other hand, the FIC
prefers submodels "3," "34," "345" and "5" when the focus is on $\mu_1$,
$\mu_2$, $\mu_3$ and $\mu_4$, respectively. It is noticeable that submodel
"36" is also competitive for $\mu_1$. We are inclined to use submodel "3"
since it has fewer parameters.
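
The selection step described above can be reproduced from the criterion values in Table 4. The sketch below enumerates the 16 submodel labels and picks the minimizer of each criterion. The criterion values are copied from Table 4; the variable and function names (`labels`, `criteria`, `select`) and the tie-breaking rule (first minimizer in column order) are assumptions made for the illustration.

```python
from itertools import combinations

# Candidate covariates, numbered as in the text: 3 = DBP, 4 = NumPreg, 5 = SI, 6 = TSFT.
CANDIDATES = {"3": "DBP", "4": "NumPreg", "5": "SI", "6": "TSFT"}

# Labels "0", "3", ..., "3456" in the column order of Table 4.
labels = ["0"] + [
    "".join(subset)
    for size in range(1, len(CANDIDATES) + 1)
    for subset in combinations(sorted(CANDIDATES), size)
]

# Criterion values copied from Table 4, one per submodel, in the order of `labels`.
criteria = {
    "AIC": [717.2, 712.1, 716.7, 716.5, 718.3, 711.5, 711.8, 713.8,
            716.1, 717.8, 718.4, 711.3, 713.2, 713.7, 718.0, 713.3],
    "BIC": [791.3, 790.7, 795.4, 795.2, 797.0, 794.7, 795.0, 797.1,
            799.4, 801.1, 801.7, 799.2, 801.1, 801.6, 806.0, 805.8],
    "mu1-FIC": [11.58, 9.86, 13.69, 11.07, 11.4, 10.97, 11.74, 9.86,
                11.63, 13.41, 11.36, 11.16, 10.96, 12.17, 12.08, 11.54],
    "mu2-FIC": [7.87, 7.83, 7.66, 7.77, 7.95, 7.58, 7.77, 8.07,
                7.79, 7.93, 7.97, 7.80, 8.00, 7.98, 8.01, 7.99],
    "mu3-FIC": [261.5, 51.9, 143.7, 245.8, 219.1, 38.0, 48.7, 47.9,
                144.7, 122.5, 235.1, 37.7, 38.8, 51.0, 140.7, 38.7],
    "mu4-FIC": [10.56, 53.98, 24.17, 4.22, 9.82, 30.70, 30.08, 51.82,
                35.28, 23.98, 6.02, 31.38, 30.71, 30.10, 35.43, 31.93],
}

def select(values):
    """Return the label of the submodel with the smallest criterion value (first on ties)."""
    best = min(range(len(values)), key=values.__getitem__)
    return labels[best]

selected = {name: select(values) for name, values in criteria.items()}
print(selected)
```

The AIC and BIC rows do not depend on the focus parameter, so each yields a single choice, whereas the FIC is recomputed for every focus and can select a different submodel each time.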

We further examine the predictive power of the above model selection and
averaging methods through a cross-validation experiment. For each patient
in the data set, we use the AIC, BIC, FIC and S-FIC to carry out estimation
based on all of the other patients as a training sample, and then predict
the left-out observation. The prediction error ratios (the ratio of the
number of mistaken predictions to the sample size) corresponding to the
AIC, BIC, FIC and S-FIC are 0.228, 0.225, 0.221 and 0.221, respectively.
Both the FIC and S-FIC show smaller prediction errors than the AIC and BIC,
although the differences among these errors are not substantial. These
results favor the FIC and S-FIC over the AIC and BIC.
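
The leave-one-out scheme and the prediction error ratio can be sketched as follows. This is only an illustration of the bookkeeping: `loo_error_ratio` is a hypothetical helper, and the nearest-class-mean classifier and the toy data are stand-ins for the selected or averaged GAPLM fits and the diabetes data.

```python
import numpy as np

def loo_error_ratio(x, y, fit, predict):
    """Leave-one-out error ratio: number of mistaken predictions divided by the sample size."""
    n = len(y)
    mistakes = 0
    for i in range(n):
        train = np.arange(n) != i  # all observations except the i-th
        model = fit(x[train], y[train])
        mistakes += int(predict(model, x[i:i + 1])[0] != y[i])
    return mistakes / n

# Hypothetical stand-in classifier: assign each point to the nearest class mean.
def fit_nearest_mean(x, y):
    return x[y == 0].mean(axis=0), x[y == 1].mean(axis=0)

def predict_nearest_mean(model, x):
    mean0, mean1 = model
    dist0 = np.linalg.norm(x - mean0, axis=1)
    dist1 = np.linalg.norm(x - mean1, axis=1)
    return (dist1 < dist0).astype(int)

# Toy data (assumption): one covariate, with the point at 0.5 on the "wrong" side.
x_toy = np.array([[-2.0], [-1.5], [-1.0], [0.5], [1.0], [1.5], [2.0]])
y_toy = np.array([0, 0, 0, 0, 1, 1, 1])
ratio = loo_error_ratio(x_toy, y_toy, fit_nearest_mean, predict_nearest_mean)
```

Passing `fit` and `predict` as arguments keeps the cross-validation loop independent of the estimator, so the same loop could be reused for each of the four selection or averaging rules.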

6. Discussion. We have proposed an effective procedure that uses the
polynomial spline technique along with the model averaging principle to
improve the accuracy of estimation in GAPLMs in the presence of model
uncertainty. Our method avoids iterative algorithms and reduces
computational challenges, so its computational gain is substantial. Most
importantly, the estimators of the linear components we have developed are
still asymptotically normal. Both theoretical and numerical studies show
promise for the proposed methods.

GAPLMs are general enough to cover a variety of semiparametric models such
as partially linear additive models [Liang et al. (2008)] and generalized
partially linear models [Severini and Staniswalis (1994)]. It is worth
pointing out that GAPLMs do not involve any interaction between
nonparametric components (which may arise in particular applications), and
thus our current methods do not deal with this situation. We conjecture
that our procedure can be applied when interactions are also included in
the model search through tensor polynomial spline approximation, but this
extension poses additional challenges. How to develop model selection and
model averaging procedures in such a complex structure warrants further
investigation.



Table 4
Results for the diabetes study: AIC, BIC and FIC values, and estimators of focus parameters

                       0       3       4       5       6      34      35      36      45      46      56     345     346     356     456    3456
AIC                717.2   712.1   716.7   716.5   718.3   711.5   711.8   713.8   716.1   717.8   718.4  711.3*   713.2   713.7   718.0   713.3
BIC                791.3  790.7*   795.4   795.2   797.0   794.7   795.0   797.1   799.4   801.1   801.7   799.2   801.1   801.6   806.0   805.8
$\mu_1$-FIC        11.58   9.86*   13.69   11.07    11.4   10.97   11.74    9.86   11.63   13.41   11.36   11.16   10.96   12.17   12.08   11.54
$\hat\mu_1$         1.09    1.11    1.09    1.15   1.092    1.11    1.17    1.11    1.15    1.09    1.15    1.17    1.11    1.17    1.15    1.17
$\mu_2$-FIC         7.87    7.83    7.66    7.77    7.95   7.58*    7.77    8.07    7.79    7.93    7.97    7.80    8.00    7.98    8.01    7.99
$\hat\mu_2$         0.31    0.31    0.31    0.33    0.32    0.32    0.33    0.32    0.33    0.33    0.33    0.33    0.32    0.33    0.34    0.33
$\mu_3$-FIC        261.5    51.9   143.7   245.8   219.1    38.0    48.7    47.9   144.7   122.5   235.1   37.7*    38.8    51.0   140.7    38.7
$\hat\mu_3$        -2.62   -2.23   -2.57   -2.70   -2.66   -2.17   -2.30   -2.26   -2.65   -2.61   -2.70   -2.24   -2.19   -2.29   -2.65   -2.23
$\mu_4$-FIC        10.56   53.98   24.17   4.22*    9.82   30.70   30.08   51.82   35.28   23.98    6.02   31.38   30.71   30.10   35.43   31.93
$\hat\mu_4$         1.63    1.59    1.66    1.73    1.61    1.62    1.68    1.58    1.75    1.64    1.71    1.71    1.61    1.69    1.74    1.71

* denotes the minimal AIC, BIC or FIC values of the corresponding row.
APPENDIX

Let $\|\cdot\|$ be the Euclidean norm and $\|\varphi\|_\infty = \sup_m |\varphi(m)|$ be the
supremum norm of a function $\varphi$ on $[0,1]$. As in Carroll et al. (1997), we let
$q_l(m,y) = \partial^l Q\{g^{-1}(m),y\}/\partial m^l$, so that

\[ q_1(m,y) = \partial Q\{g^{-1}(m),y\}/\partial m = \{y - g^{-1}(m)\}\rho_1(m) \]

and

\[ q_2(m,y) = \partial^2 Q\{g^{-1}(m),y\}/\partial m^2 = \{y - g^{-1}(m)\}\rho_1'(m) - \rho_2(m). \]
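
As a concrete check of these formulas, consider the logistic model used in Section 5. The worked example below assumes the definition $\rho_l(m) = \{dg^{-1}(m)/dm\}^l / V\{g^{-1}(m)\}$ with variance function $V$ (taken to be the definition from Section 2; it is not restated in this appendix, so treat it as an assumption), the logit link, and $V(\mu) = \mu(1-\mu)$.

```latex
\begin{align*}
\mu(m) &= g^{-1}(m) = \frac{e^{m}}{1+e^{m}}, \qquad \frac{d\mu}{dm} = \mu(1-\mu),\\
\rho_1(m) &= \frac{\mu(1-\mu)}{\mu(1-\mu)} = 1, \qquad \rho_1'(m) = 0, \qquad
\rho_2(m) = \frac{\{\mu(1-\mu)\}^2}{\mu(1-\mu)} = \mu(1-\mu),\\
q_1(m,y) &= y - \mu(m), \qquad q_2(m,y) = -\mu(m)\{1-\mu(m)\} < 0 .
\end{align*}
```

Under these assumptions $q_1$ is the usual logistic score residual and $q_2$ is negative for every $m$; its absolute value is at most $1/4$ and stays bounded away from zero only when $m$ ranges over a bounded set.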

A.1. Conditions. Let $r$ be a positive integer and $\nu \in (0,1]$ be such that
$\upsilon = r + \nu > 1.5$. Let $\mathcal{H}$ be the collection of functions $f$ on $[0,1]$ whose
$r$th derivative, $f^{(r)}$, exists and satisfies the Lipschitz condition of order $\nu$;
that is,

\[ |f^{(r)}(m^*) - f^{(r)}(m)| \le C_1 |m^* - m|^{\nu} \qquad \text{for } 0 \le m^*, m \le 1, \]

where $C_1$ is a generic positive constant. In what follows, $c$, $C$, $c_\cdot$, $C_\cdot$ and $C^*$
are all generic positive constants. The following are the conditions needed
to obtain Theorems 1-3:

(C1) each component function $\eta_{0,\alpha} \in \mathcal{H}$, $\alpha = 1,\ldots,p$;

(C2) $q_2(m,y) < 0$ and $c_q < |q_2(m,y)| < C_q$ for $m \in R$ and $y$ in the range of
the response variable;

(C3) the function $\eta_0(\cdot)$ is continuous;

(C4) the distribution of $X$ is absolutely continuous and its density $f$ is
bounded away from zero and infinity on $[0,1]^p$;

(C5) $E(ZZ^T \mid X = x)$ exists and $A = E[\rho_2\{m_0(T)\}ZZ^T]$ is invertible,
almost surely;

(C6) the number of interior knots $J_n$ satisfies $n^{1/(2\upsilon)} \ll J_n \ll n^{1/3}$;

(C7) $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n \begin{pmatrix} B(X_i)B^T(X_i) & B(X_i)Z_i^T \\ Z_iB^T(X_i) & Z_iZ_i^T \end{pmatrix}$ exists and is nonsingular;

(C8) for $\rho_1$ introduced in Section 2, $|\rho_1(m_0)| \le C_\rho$ and
$|\rho_1(m) - \rho_1(m_0)| \le C^*|m - m_0|$ for all $|m - m_0| \le C_m$;

(C9) the matrix $D$ is invertible almost surely;

(C10) the link function $g$ in model (2.1) satisfies
$|\frac{d}{dm} g^{-1}(m)|_{m=m_0}| \le C_g$ and
$|\frac{d}{dm} g^{-1}(m) - \frac{d}{dm} g^{-1}(m)|_{m=m_0}| \le C^*|m - m_0|$ for all $|m - m_0| \le C_m^*$;

(C11) there exists a positive constant $C_\varepsilon$ such that $E(\varepsilon^2 \mid T = t) \le C_\varepsilon$
almost surely.
A.2. Technical lemmas. In the following, for any probability measure $P$,
we define $L_2(P) = \{f : \int f^2\,dP < \infty\}$. Let $\mathcal{F}$ be a subclass of $L_2(P)$. The
bracketing number $N_{[\,]}(\tau, \mathcal{F}, L_2(P))$ of $\mathcal{F}$ is defined as the smallest value
of $N$ for which there exist pairs of functions $\{[f_j^L, f_j^U]\}_{j=1}^N$ with
$\|f_j^U - f_j^L\| \le \tau$, such that for each $f \in \mathcal{F}$, there exists a
$j \in \{1,\ldots,N\}$ such that $f_j^L \le f \le f_j^U$. Define the entropy integral

\[ J_{[\,]}(\tau, \mathcal{F}, L_2(P)) = \int_0^{\tau} \sqrt{1 + \log N_{[\,]}(\iota, \mathcal{F}, L_2(P))}\,d\iota. \]

Let $P_n$ be the empirical measure of $P$. Define $G_n = \sqrt{n}(P_n - P)$ and
$\|G_n\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}} |G_n f|$ for any measurable class of functions $\mathcal{F}$.

We state or prove several preliminary lemmas first. Lemmas A.1-A.3 will
be used to prove the remaining lemmas. Lemmas A.4-A.5 are used to prove
Theorem 1. Theorems 2-3 are obtained from Theorem 1.

Lemma A.1 [Lemma 3.4.2 of van der Vaart and Wellner (1996)]. Let $M_0$ be
a finite positive constant. Let $\mathcal{F}$ be a uniformly bounded class of
measurable functions such that $Pf^2 < \tau^2$ and $\|f\|_\infty < M_0$. Then

\[ E_P\|G_n\|_{\mathcal{F}} \le C_0\, J_{[\,]}(\tau, \mathcal{F}, L_2(P)) \Big\{ 1 + \frac{J_{[\,]}(\tau, \mathcal{F}, L_2(P))}{\tau^2\sqrt{n}}\, M_0 \Big\}, \]

where $C_0$ is a finite constant not dependent on $n$.

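Lemma A.2 [Lemma A.2 of Huang (1999)]. For any $\tau > 0$, let
$\Theta_n = \{\eta(x) + z^T\beta;\ \|\beta - \beta_0\| \le \tau,\ \eta \in \mathcal{G}_n,\ \|\eta - \eta_0\|_2 \le \tau\}$.
Then, for any $\iota \le \tau$, $\log N_{[\,]}(\iota, \Theta_n, L_2(P)) \le c_0 (J_n + \varrho)\log(\tau/\iota)$,
where $c_0$ is a finite constant not dependent on $n$.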

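Referring to the result of de Boor [(2001), page 149], for any function
$f \in \mathcal{H}$ and $n \ge 1$, there exists a function $\tilde f \in S_n$ such that
$\|\tilde f - f\|_\infty \le C h^{\upsilon}$, where $C$ is some fixed positive constant. From
condition (C1), we can find
$\tilde\gamma = \{\tilde\gamma_{j,\alpha},\ j = -\varrho+1,\ldots,J_n,\ \alpha = 1,\ldots,p\}^T$ and an additive
spline function $\tilde\eta = \tilde\gamma^T B(x) \in \mathcal{G}_n$ such that

\[ \|\tilde\eta - \eta_0\|_\infty = O(h^{\upsilon}). \tag{A.1} \]

Let $\tilde\beta_S = \arg\max_{\beta_S} \frac{1}{n}\sum_{i=1}^n Q[g^{-1}\{\tilde\eta(X_i) + (\Pi_S Z_i)^T\beta_S\}, Y_i]$,
$m_{0,i} \equiv m_0(T_i) = \eta_0(X_i) + Z_i^T\beta_0$ and
$\tilde m_{0,i} \equiv \tilde m_0(T_i) = \tilde\eta(X_i) + Z_i^T\beta_0 = \tilde\gamma^T B(X_i) + Z_i^T\beta_0$.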

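Lemma A.3. Under the local misspecification framework and conditions
(C1)-(C6),

\[ \sqrt{n}\,\Pi_S^T(\tilde\beta_S - \Pi_S\beta_0) - \Pi_{S^c}^T\delta_{S^c} \to_d N(0, A^{-1}\Sigma_1 A^{-1}), \tag{A.2} \]

where $\delta_{S^c}$ consists of the elements of $\delta$ that are not in the $S$th submodel,
$\pi_{S^c}$ is the projection matrix mapping $\delta$ to $\delta_{S^c}$,
$\Pi_{S^c} = [0_{(d_u - d_{u,S})\times d_c}, \pi_{S^c}]$ and $\Sigma_1 = E[q_1^2\{m_0(T), Y\}ZZ^T]$.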
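Proof. Let $\vartheta = \sqrt{n}\,\Pi_S^T(\beta_S - \Pi_S\beta_0) - \Pi_{S^c}^T\delta_{S^c}$ and
$\tilde\vartheta = \sqrt{n}\,\Pi_S^T(\tilde\beta_S - \Pi_S\beta_0) - \Pi_{S^c}^T\delta_{S^c}$. Note that $\tilde\beta_S$
maximizes $\frac{1}{n}\sum_{i=1}^n Q[g^{-1}\{\tilde\eta(X_i) + (\Pi_S Z_i)^T\beta_S\}, Y_i]$, so $\tilde\vartheta$
maximizes

\[ \ell_n(\vartheta) = \sum_{i=1}^n \big[ Q\{g^{-1}(\tilde m_{0,i} + n^{-1/2}\vartheta^T Z_i), Y_i\} - Q\{g^{-1}(\tilde m_{0,i}), Y_i\} \big]. \]

By Taylor expansion, one has
$\ell_n(\vartheta) = n^{-1/2}\sum_{i=1}^n q_1(\tilde m_{0,i}, Y_i)\vartheta^T Z_i + \frac{1}{2}\vartheta^T A_n \vartheta$, where
$A_n = \frac{1}{n}\sum_{i=1}^n \{Y_i\rho_1'(\tilde m_{0,i} + \zeta_{ni}) - \rho_3(\tilde m_{0,i} + \zeta_{ni}^*)\}Z_iZ_i^T$
with $\zeta_{ni}$ and $\zeta_{ni}^*$ both lying between 0 and $n^{-1/2}\vartheta^T Z_i$, and
$\rho_3(m) = g^{-1}(m)\rho_1'(m) + \rho_2(m)$, so that $y\rho_1'(m) - \rho_3(m) = q_2(m,y)$.
From the proof of Theorem 2 in Carroll et al. (1997),
$A_n = -E[\rho_2\{m_0(T)\}ZZ^T] + o_P(1) = -A + o_P(1)$ and

\[ \frac{1}{\sqrt{n}}\sum_{i=1}^n q_1(\tilde m_{0,i}, Y_i)Z_i = \frac{1}{\sqrt{n}}\sum_{i=1}^n q_1(m_{0,i}, Y_i)Z_i + \frac{1}{\sqrt{n}}\sum_{i=1}^n q_2(m_{0,i}, Y_i)\{\tilde\eta(X_i) - \eta_0(X_i)\}Z_i. \]

In addition, by (A.1) and conditions (C2), (C5) and (C6), we have

\[ n^{-1/2}\sum_{i=1}^n q_2(m_{0,i}, Y_i)Z_i\{\tilde\eta(X_i) - \eta_0(X_i)\} = O_P(n^{1/2}h^{\upsilon}) = o_P(1). \]

Therefore, by the convexity lemma of Pollard (1991) and condition (C5),
one has $\tilde\vartheta = A^{-1}n^{-1/2}\sum_{i=1}^n q_1(m_{0,i}, Y_i)Z_i + o_P(1)$ and
$\mathrm{var}[q_1\{m_0(T), Y\}Z] = E[q_1^2\{m_0(T), Y\}ZZ^T] = \Sigma_1$, so (A.2) holds. □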

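Define $a_{n,h} = h^{\upsilon} + (n^{-1}\log n)^{1/2}$, $\theta_S = (\gamma^T, \beta_S^T)^T$,
$\hat\theta_S = (\hat\gamma_S^T, \hat\beta_S^T)^T$ and $\tilde\theta_S = (\tilde\gamma^T, \tilde\beta_S^T)^T$.

Lemma A.4. Under the local misspecification framework and conditions
(C1)-(C8), one has $\|\hat\theta_S - \tilde\theta_S\| = O_P(J_n^{1/2} a_{n,h})$.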

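Proof. Note that

\[ \frac{\partial \ell_n(\theta_S)}{\partial\theta_S}\Big|_{\theta_S=\hat\theta_S} - \frac{\partial \ell_n(\theta_S)}{\partial\theta_S}\Big|_{\theta_S=\tilde\theta_S} = \frac{\partial^2 \ell_n(\theta_S)}{\partial\theta_S\,\partial\theta_S^T}\Big|_{\theta_S=\bar\theta_S} (\hat\theta_S - \tilde\theta_S), \tag{A.3} \]

with $\bar\theta_S$ lying between $\hat\theta_S$ and $\tilde\theta_S$. Recalling the equation (2.5), one has

\[ \frac{\partial \ell_n(\theta_S)}{\partial\theta_S}\Big|_{\theta_S=\tilde\theta_S} = \Big\{ \Big(\frac{\partial \ell_n(\theta_S)}{\partial\gamma}\Big)^T, \Big(\frac{\partial \ell_n(\theta_S)}{\partial\beta_S}\Big)^T \Big\}^T \Big|_{\theta_S=\tilde\theta_S}, \]

where, writing $\Delta_S = \Pi_S^T(\tilde\beta_S - \Pi_S\beta_0) - \Pi_{S^c}^T\delta_{S^c}/\sqrt{n}$,

\[ \frac{\partial \ell_n(\theta_S)}{\partial\gamma}\Big|_{\theta_S=\tilde\theta_S} = \frac{1}{n}\sum_{i=1}^n q_1(m_{0,i}, Y_i)B(X_i) + \frac{1}{n}\sum_{i=1}^n q_2(\xi_i, Y_i)\{\tilde\eta(X_i) - \eta_0(X_i)\}B(X_i) + \frac{1}{n}\sum_{i=1}^n q_2(\xi_i, Y_i)\Delta_S^T Z_i B(X_i) \]

and

\[ \frac{\partial \ell_n(\theta_S)}{\partial\beta_S}\Big|_{\theta_S=\tilde\theta_S} = \frac{1}{n}\sum_{i=1}^n q_1(m_{0,i}, Y_i)\Pi_S Z_i + \frac{1}{n}\sum_{i=1}^n q_2(\xi_i^*, Y_i)\{\tilde\eta(X_i) - \eta_0(X_i)\}\Pi_S Z_i + \frac{1}{n}\sum_{i=1}^n q_2(\xi_i^*, Y_i)\Delta_S^T Z_i\,\Pi_S Z_i, \]

with $\xi_i$ and $\xi_i^*$ both lying between $m_{0,i}$ and
$\tilde m_{S,i} = \tilde\eta(X_i) + (\Pi_S Z_i)^T\tilde\beta_S$. According to the Bernstein
inequality and condition (C8),

\[ \Big\| \frac{1}{n}\sum_{i=1}^n q_1(m_{0,i}, Y_i)B(X_i) \Big\|_\infty = \max_{-\varrho+1 \le j \le J_n,\ 1 \le \alpha \le p} \Big| \frac{1}{n}\sum_{i=1}^n q_1(m_{0,i}, Y_i)B_{j,\alpha}(X_{i\alpha}) \Big| = O_P\{(n^{-1}\log n)^{1/2}\}. \]

And, by (A.1), Lemma A.3 and condition (C2), one has

\[ \frac{1}{n}\sum_{i=1}^n \| q_2(\xi_i, Y_i)\{\tilde\eta(X_i) - \eta_0(X_i)\}B(X_i) \|_\infty = O_P(h^{\upsilon}) \]

and

\[ \frac{1}{n}\sum_{i=1}^n \| q_2(\xi_i, Y_i)\Delta_S^T Z_i B(X_i) \|_\infty = O_P(n^{-1/2}). \]

Therefore, $\|\partial \ell_n(\theta_S)/\partial\gamma|_{\theta_S=\tilde\theta_S}\|_\infty = O_P(a_{n,h})$. Similarly, we can prove

\[ \Big\| \frac{\partial \ell_n(\theta_S)}{\partial\beta_S}\Big|_{\theta_S=\tilde\theta_S} \Big\|_\infty = O_P\{h^{\upsilon} + (n^{-1}\log n)^{1/2}\}. \]

Thus,

\[ \Big\| \frac{\partial \ell_n(\theta_S)}{\partial\theta_S}\Big|_{\theta_S=\tilde\theta_S} \Big\|_\infty = O_P(a_{n,h}). \tag{A.4} \]

Let $\bar m_{S,i} \equiv \bar m_S(T_i) = \bar\theta_S^T (B^T(X_i), (\Pi_S Z_i)^T)^T$. For the second order
derivative, one has

\[ \frac{\partial^2 \ell_n(\theta_S)}{\partial\theta_S\,\partial\theta_S^T}\Big|_{\theta_S=\bar\theta_S} = \begin{pmatrix} \frac{\partial^2 \ell_n(\theta_S)}{\partial\gamma\,\partial\gamma^T} & \frac{\partial^2 \ell_n(\theta_S)}{\partial\gamma\,\partial\beta_S^T} \\ \frac{\partial^2 \ell_n(\theta_S)}{\partial\beta_S\,\partial\gamma^T} & \frac{\partial^2 \ell_n(\theta_S)}{\partial\beta_S\,\partial\beta_S^T} \end{pmatrix}\Big|_{\theta_S=\bar\theta_S} = \frac{1}{n}\sum_{i=1}^n q_2(\bar m_{S,i}, Y_i) \begin{pmatrix} B(X_i)B^T(X_i) & B(X_i)Z_i^T \\ Z_iB^T(X_i) & Z_iZ_i^T \end{pmatrix}, \]

by which, along with conditions (C2) and (C7), we know that the matrix
$\partial^2 \ell_n(\theta_S)/\partial\theta_S\partial\theta_S^T|_{\theta_S=\bar\theta_S}$ is nonsingular in probability. So,
according to (A.3) and (A.4), we have completed the proof. □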

Define $\mathcal{M}_n = \{m(x,z) = \eta(x) + z^T\beta : \eta \in \mathcal{G}_n\}$ and a class of functions
$\mathcal{A}(\tau) = \{\rho_1(m(t))\psi(t) : m \in \mathcal{M}_n, \|m - m_0\| \le \tau\}$.

Lemma A.5. Under the local misspecification framework and conditions
(C1)-(C8), we have

\[ \frac{1}{n}\sum_{i=1}^n \{\hat\eta_S(X_i) - \eta_0(X_i)\}\rho_1(m_{0,i})\psi(T_i) = o_P(n^{-1/2}), \tag{A.5} \]

\[ \frac{1}{n}\sum_{i=1}^n \rho_1(m_{0,i})\psi(T_i)\Gamma(X_i)^T\Pi_S^T(\hat\beta_S - \Pi_S\beta_0) = o_P(n^{-1/2}). \tag{A.6} \]

Proof. Noting that $\psi$ and $\rho_1$ are fixed bounded functions under
condition (C8), by Lemma A.2, similar to the proof of Corollary A.1 in Huang
(1999), we can show, for any $\iota \le \tau$,
$\log N_{[\,]}(\iota, \mathcal{A}(\tau), \|\cdot\|) \le c_0\{(J_n + \varrho)\log(\tau/\iota) + \log(\iota^{-1})\}$,
so the corresponding entropy integral satisfies
$J_{[\,]}(\tau, \mathcal{A}(\tau), \|\cdot\|) \le c_0\tau\{(J_n + \varrho)^{1/2} + (\log\tau^{-1})^{1/2}\}$.
According to Lemma A.4,
$\|\hat\eta_S - \tilde\eta\|_2^2 = (\hat\gamma_S - \tilde\gamma)^T \sum_{i=1}^n \{B(X_i)B^T(X_i)\}(\hat\gamma_S - \tilde\gamma)/n \le C\|\hat\gamma_S - \tilde\gamma\|_2^2$,
thus $\|\hat\eta_S - \tilde\eta\|_2 = O_P(J_n^{1/2}a_{n,h})$ and
$\|\hat\eta_S - \eta_0\|_2 \le \|\hat\eta_S - \tilde\eta\|_2 + \|\tilde\eta - \eta_0\|_2 = O_P(J_n^{1/2}a_{n,h})$.
Now, by Lemma 7 of Stone (1986),

\[ \|\hat\eta_S - \eta_0\|_\infty \le C J_n^{1/2}\|\hat\eta_S - \eta_0\|_2 = O_P(J_n a_{n,h}). \tag{A.7} \]

Thus, by Lemma A.1, together with conditions (C1) and (C6), we have

\[ E\Big| \frac{1}{n}\sum_{i=1}^n \{\hat\eta_S(X_i) - \eta_0(X_i)\}\rho_1(m_{0,i})\psi(T_i) - E[\{\hat\eta_S(X) - \eta_0(X)\}\rho_1\{m_0(T)\}\psi(T)] \Big| = o(n^{-1/2}). \]

In addition, by the definition of $\psi$, $E[\phi(X)\rho_1\{m_0(T)\}\psi(T)] = 0$ for any
measurable function $\phi$. Hence (A.5) holds. Similarly, (A.6) follows from
Lemmas A.1-A.4. □

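A.3. Proof of Theorem 1. Let $\hat m_{S,i} \equiv \hat m_S(T_i) = \hat\eta_S(X_i) + \hat\beta_S^T\Pi_S Z_i$. For
any $v \in R^{d_c + d_{u,S}}$, define
$\hat m_S(v) = \hat m_S(x, \Pi_S z) + v^T\{\Pi_S z - \Pi_S\Gamma(x)\} = \hat m_S(x, \Pi_S z) + v^T\Pi_S\psi(t)$.
Note that when $v = 0$, $\hat m_S(v)$ maximizes $\sum_{i=1}^n Q[g^{-1}\{m_S(T_i)\}, Y_i]$ for all
$m_S \in \{m_S(x,z) = \eta(x) + (\Pi_S z)^T\beta_S : \eta \in \mathcal{G}_n\}$, by which

\begin{align*}
0 &\equiv \frac{\partial}{\partial v}\ell_n(\hat m_S(v))\Big|_{v=0} = \frac{1}{n}\sum_{i=1}^n \{Y_i - g^{-1}(\hat m_{S,i})\}\rho_1(\hat m_{S,i})\Pi_S\psi(T_i) \\
&= \frac{1}{n}\sum_{i=1}^n q_1(m_{0,i}, Y_i)\Pi_S\psi(T_i) + \frac{1}{n}\sum_{i=1}^n \varepsilon_i\{\rho_1(\hat m_{S,i}) - \rho_1(m_{0,i})\}\Pi_S\psi(T_i) \\
&\quad - \frac{1}{n}\sum_{i=1}^n \{g^{-1}(\hat m_{S,i}) - g^{-1}(m_{0,i})\}\rho_1(\hat m_{S,i})\Pi_S\psi(T_i) \\
&\equiv I + II - III. \tag{A.8}
\end{align*}

Note that for the second term $E[\varepsilon_i\{\rho_1(\hat m_{S,i}) - \rho_1(m_{0,i})\}\Pi_S\psi(T_i)] = 0$. From
Lemma A.3, (A.6) and (A.7), we have $\|\hat m_S - m_0\|_\infty = O_P(J_n^{1/2}a_{n,h})$, so, by
condition (C8), $\|\rho_1(\hat m_S) - \rho_1(m_0)\|_\infty = O_P(J_n^{1/2}a_{n,h})$. Now, by the Bernstein
inequality, under condition (C11), we show that

\[ II = \frac{1}{n}\sum_{i=1}^n \varepsilon_i\{\rho_1(\hat m_{S,i}) - \rho_1(m_{0,i})\}\Pi_S\psi(T_i) = o_P(n^{-1/2}). \tag{A.9} \]

Express the third term as

\begin{align*}
III &= \frac{1}{n}\sum_{i=1}^n (\hat m_{S,i} - m_{0,i})\rho_1(m_{0,i})\Pi_S\psi(T_i) \\
&\quad + \frac{1}{n}\sum_{i=1}^n \{g^{-1}(\hat m_{S,i}) - g^{-1}(m_{0,i}) - (\hat m_{S,i} - m_{0,i})\}\rho_1(m_{0,i})\Pi_S\psi(T_i) \\
&\quad + \frac{1}{n}\sum_{i=1}^n \{g^{-1}(\hat m_{S,i}) - g^{-1}(m_{0,i})\}\{\rho_1(\hat m_{S,i}) - \rho_1(m_{0,i})\}\Pi_S\psi(T_i) \\
&\equiv III_1 + III_2 + III_3.
\end{align*}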

From Lemma A. 5, a direct simplification yields 
1 " 

nil = - J^{%(Xi) + ^n^z, - r/o(x,) - 0^z,]pl{mo,^)Iisi^{^:i) 
1=1 

1 " 

= - E{^5(X0 - %(X.) + (n|/35 - /3o)^V'(Ti) 
1=1 

+ {UlPs - /3o)^r(Xi)}pi(mo,0nsV'(Ti) 

1 " 

= -y"{%(Xi) -%(Xi)}/9i(mo,i)n5V'(T,) 



1=1 



+ -Y,pi{mQ,i)Ils^{T{)^{T{f llilPs ■ 
^ i=i 

1 " 

--Vpi(mo,i)nsV'(T.)V'(Tir[0,/]^V^^ 
1=1 

1 " 

+ - J^Pi(mo,onsV'(Ti)r(Xi)^n|(/35 - n^A 

^ i=i 

1 " 

+ -J^Pi(mo,0n5^(T0r(Xi)^n|(-55/v^) 



/3c,o 



1 " f- 
- J^pi(mo,i)n5V'(Ti)V'(Tirn| 



n 

i=l 



'c,0 



11 

Vpi(mo,i)n5^(T,)V'(Ti)^[0, + . 
Vn n ^-^ 

* 2=1 
In addition, from conditions (C8) and (C10), referring to the proof of (A.5),
we have $III_2 = o_P(n^{-1/2})$ and $III_3 = o_P(n^{-1/2})$. Therefore,

/3c,o 




III=[E{pi{m^)Ils^{T)^{Tflil} + Op{l)] \ Ps 

(A.IO) 



L[E{pi{mo)UsiPmiTf[0,lf} + Opil)]6 + Op{n-'/^) 



Thus, by combining (A.8), (A.9), (A.10) and condition (C9), the desired
distribution of $\hat\beta_S$ follows.

A. 4. Proof of Theorem 2. By the Taylor expansion, p,Q = p(f3cfl,d/y/n) = 
fi{Pc,o,0) + fil5/^+ o{n~^/'^) and 

= M/3e,o,0) + pj {u^sPs - ( V) } + 

where the second equation follows from the asymptotic normality of Ps- 
Thus, by Theorem 1, 

^{ps-po) = /ij|n|^5- (^o'°)}-/^n'^ + Op(l) 

= -/ijRsGn + pjKs'D (^^^^ - pl5 + opil) 

A-/ijR5G + /xJ(RsD-/) 

Thus, the proof is complete. 

A.5. Proof of Theorem 3. Recalling the definitions of 115 and R5, we 

have 



which, along with the definition of 6 and Theorem 2, indicates that 

y/n{p-po) = ^w{S\5)y/n{ps - Pq) 

s 

= ^u;(5|5)|-/ijR5G„ + MjR5D(°) - p]^5 + Opil)'^ 



(1) 
-^lJJ2w{S\6)Ks'Dr^



I 



D^G 



n 



^^,JY,w{S\6)Rs^D(l)-^^l{6+[0,I]B~'Gn) + Op{l) 
-^Jd-G + ;.J{q(A)(0)-(0)} 



and thus the proof is complete. 